
Name:
The purpose of this proposal is to analyze the Zillow housing data for the Pacific Coast, more specifically California, Washington, and Oregon. We seek to answer four questions: 1) how different are the housing markets of these three states; 2) building on that, which county in each state is the most affordable and which is the most expensive place to buy a house; 3) when is the right time to buy a house, focusing on whether a particular season (Summer, Autumn, Winter, or Spring) tends to see lower or higher home prices; and 4) what the housing bubble looked like, so as to have a more comprehensive picture of the changes within the housing market.
By comparing the estimated average housing price for each state with its true average housing price, it was clear that California was the most expensive state to live in, as anticipated. Taking it a step further, we hypothesized that the population density of a county could affect the average housing price within that county; while that appeared true in both Washington and Oregon, California's data did not support this claim. Had surrounding geographical factors been taken into consideration (e.g., the location of state parks, Silicon Valley, and San Francisco), California's results might not have been as surprising as they initially were. With limited prior knowledge of real estate trends, we expected Summer to be a great time to view houses on the market, but more competitive and less cost-effective, while Fall might be a great time to buy, since open houses may be less frequented and there may therefore be fewer competitors trying to outbid one another.

Almost everyone would consider buying a house an important milestone in life, and since we will be graduating from UCSB soon, the thought of buying a perfect home is intriguing as one of the next chapters of our lives. Using data from Zillow, a popular real estate website, we analyze the housing markets of California, Oregon, and Washington to estimate the average housing price within each state, determine the best counties to buy homes in and the best time to buy a house, and understand how other specified factors influence the decision to buy a house.
import pandas as pd
import numpy as np
import math
##Must "conda install plotly" on terminal before proceeding
import matplotlib.pyplot as plt
import seaborn as sns
df_sell_price= pd.read_csv('City_Zhvi_3bedroom.csv', encoding='latin-1')
Zillow contains housing data for the entire United States. By concentrating on California, Oregon, and Washington, we focus our analysis on the data most relevant to us as California residents and gain a better understanding of the varying housing markets on the West Coast. We also wanted a more recent picture of the housing market after the crash, so we restrict the data to housing prices from January 2010 through December 2018. (For the month-by-month comparison later on, we also keep a copy covering January 2009 through December 2018, stored as pacific_part3.)
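As a minimal sketch of this filtering step, the toy table below mimics the layout we rely on from the Zillow file (a State column plus one "YYYY-MM" column per month); the cities and values are invented purely for illustration. Selecting rows with isin and slicing the month columns by label achieves the same result as the three .loc calls and pd.concat used in the next cell.

```python
import pandas as pd

# Toy stand-in for City_Zhvi_3bedroom.csv: identifying columns followed by monthly values.
# All numbers are made up for illustration.
months = ["2009-12", "2010-01", "2010-02", "2018-12", "2019-01"]
toy = pd.DataFrame(
    [["Fresno", "CA", 200, 201, 202, 260, 261],
     ["Portland", "OR", 250, 251, 252, 400, 401],
     ["Austin", "TX", 180, 181, 182, 330, 331],
     ["Seattle", "WA", 350, 351, 352, 600, 601]],
    columns=["RegionName", "State"] + months,
)

# Keep only the three Pacific Coast states in one step.
pacific = toy[toy["State"].isin(["CA", "WA", "OR"])]

# Label-based slicing on the columns keeps everything from 2010-01 through 2018-12 (inclusive).
prices_2010_2018 = pacific.loc[:, "2010-01":"2018-12"]
toy_pacific = pd.concat([pacific[["RegionName", "State"]], prices_2010_2018], axis=1)
print(toy_pacific)
```

Label slicing is inclusive at both ends, which is why "2018-12" itself is kept.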
CA = df_sell_price.loc[df_sell_price["State"]=="CA"]
WA = df_sell_price.loc[df_sell_price["State"]=="WA"]
OR = df_sell_price.loc[df_sell_price["State"]=="OR"]
df_pacific = pd.concat([CA,WA,OR])
pacific_part3 = df_pacific.loc[:,"2009-01":"2018-12"]
holder1 = df_pacific.iloc[:,0:6]
holder = df_pacific.loc[:,"2010-01":"2018-12"]
df_pacific = pd.concat([holder1, holder], axis = 1)
After narrowing our data down to California, Oregon, and Washington from 2010 to 2018, we use bootstrapping to estimate the average housing price for each state and compare our estimates with the true average housing prices. We first define a function that samples the rows with replacement and takes the mean of each row, where each row is one city in the Zillow file. We then take the average of those row means and store it in an array, repeating this for every bootstrap replicate. For each state, we apply the bootstrap function, average the returned array, and round the result up; this gives our estimated housing price for the state. To get the true housing price, we take the mean of each of the state's rows and then average those means.
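The resampling idea can be illustrated on synthetic data. This is a minimal sketch, assuming a small made-up table of prices rather than the real Zillow file: each replicate draws rows with replacement from the original row means, takes their mean, and the spread of the replicate means gives a standard error and a percentile interval. Computing the row means first and then resampling them is equivalent to resampling rows and averaging afterwards, as the function below does. Because every replicate is drawn from the same original rows, the average of the replicate means should land close to the plain sample mean.

```python
import numpy as np
import pandas as pd

def bootstrap_means(values, n_boot, seed=0):
    """Return n_boot bootstrap replicates of the mean of `values` (1-D array-like)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Always resample from the original data, never from the previous replicate.
        idx = rng.integers(0, n, size=n)
        replicates[b] = values[idx].mean()
    return replicates

# Made-up monthly prices for 50 hypothetical cities over 12 months.
rng = np.random.default_rng(42)
toy_prices = pd.DataFrame(rng.lognormal(mean=12.5, sigma=0.4, size=(50, 12)))

# Step 1: one mean per row (per city) across the months.
row_means = toy_prices.mean(axis=1)
# Step 2: bootstrap the mean of those row means.
reps = bootstrap_means(row_means, n_boot=1000)

sample_mean = row_means.mean()
print("sample mean:", round(sample_mean))
print("mean of bootstrap means:", round(reps.mean()))
print("bootstrap standard error:", round(reps.std(ddof=1)))
print("95% percentile interval:", np.percentile(reps, [2.5, 97.5]).round())
```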
def bootstrap_house_price_mean(data, N):
    ## Holds one bootstrap estimate of the mean price per replicate
    array = np.zeros(N)
    for i in range(N):
        ## Resample rows of the ORIGINAL data with replacement
        ## (reassigning `data` here would make each replicate a resample of the previous one)
        sample = data.iloc[np.random.choice(len(data), len(data)), :]
        ## Isolate the numeric 2010-2018 columns and average each row across months
        numeric_data = sample.loc[:, "2010-01":"2018-12"]
        bootstrap_rowmeans = numeric_data.mean(axis=1)
        ## Store the mean of the row means for this replicate
        array[i] = bootstrap_rowmeans.mean()
    return array
##Washington Comparison
WA_boot_mean = math.ceil(np.mean(bootstrap_house_price_mean(WA, 1000)))
WA_true_mean = math.ceil(WA.loc[:,"2010-01":"2018-12"].mean(axis = 1).mean())
print(f"The true mean for Washington is ${WA_true_mean}, while the bootstrap estimate is ${WA_boot_mean} (bootstrap sample size: {len(WA)}).")
#California Comparison
CA_boot_mean = math.ceil(np.mean(bootstrap_house_price_mean(CA, 1000)))
CA_true_mean = math.ceil(CA.loc[:,"2010-01":"2018-12"].mean(axis = 1).mean())
print(f"The true mean for California is ${CA_true_mean}, while the bootstrap estimate is ${CA_boot_mean} (bootstrap sample size: {len(CA)}).")
#Oregon Comparison
OR_boot_mean = math.ceil(np.mean(bootstrap_house_price_mean(OR, 1000)))
OR_true_mean = math.ceil(OR.loc[:,"2010-01":"2018-12"].mean(axis = 1).mean())
print(f"The true mean for Oregon is ${OR_true_mean}, while the bootstrap estimate is ${OR_boot_mean} (bootstrap sample size: {len(OR)}).")
CA_crash = CA.loc[:,"2006-06":"2012-12"]
When we compare these results with the true housing prices, we see that we underestimated for all the states. We suspect this underestimation is due to outliers in the housing price data. For example, our California estimate was far off, which could be because California 1) is huge in comparison to Oregon and Washington, and 2) has varied housing markets: buying a house in San Francisco or Los Angeles costs more than buying one in Fresno or Bakersfield. While our estimated average housing price for California was far off, our estimate for Oregon was fairly good, at $331,773 versus a true value of $229,240. Our estimate for Washington was the best, landing within about $13,000 of the true price.
Lastly, we performed bootstrap sampling on our Pacific data (all three states together) to estimate the average house price of the Pacific Coast and compare it with the true average price. Our estimate is quite far off, much like our result for California. We note that the sample sizes for California and for the Pacific data are much larger than those for Oregon or Washington, and this could be a contributing factor. Since the Pacific data incorporate the housing prices of all three states, they are also affected by the extreme values we suspect influenced our California results.
#Pacific Comparison
pacific_boot_mean = math.ceil(np.mean(bootstrap_house_price_mean(df_pacific, 1000)))
pacific_true_mean = math.ceil(df_pacific.loc[:,"2010-01":"2018-12"].mean(axis = 1).mean())
## The true mean comes first and the bootstrap estimate second (these were swapped before)
print(f"The true mean for the United States Pacific Coast is ${pacific_true_mean}, while the bootstrap estimate is ${pacific_boot_mean} (bootstrap sample size: {len(df_pacific)}).")
Comparing the true Pacific average housing price with the state estimates we obtained through bootstrapping, we see that our California estimate is the closest, Washington's is second, and Oregon's is the farthest from the true Pacific average, differing by over $250,000. Again, these results make sense because California has the largest sample size and therefore contributes the most data to the Pacific estimate, although, given that Washington has the smallest sample size, we would have expected its estimate to be farther from the Pacific average than it is.
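The pull that California exerts on the pooled Pacific average can be made explicit with a small worked example: the mean over all pooled rows equals the average of the three state means, weighted by each state's row count. The counts and means below are invented for illustration only and are not our Zillow results.

```python
import numpy as np

# Hypothetical row counts and state means (illustrative numbers, not the Zillow results).
counts = {"CA": 900, "WA": 300, "OR": 150}
state_means = {"CA": 600_000, "WA": 350_000, "OR": 300_000}

total_rows = sum(counts.values())
# Each state's share of the pooled data.
weights = {s: counts[s] / total_rows for s in counts}
# Pooled mean = sum over states of (share of rows) x (state mean).
pooled_mean = sum(weights[s] * state_means[s] for s in counts)

# Check: identical to averaging every row of the pooled data directly.
all_rows = np.concatenate([np.full(counts[s], state_means[s], dtype=float) for s in counts])
matches_direct = np.isclose(all_rows.mean(), pooled_mean)

for s in counts:
    print(f"{s}: weight {weights[s]:.3f}, gap to pooled mean {state_means[s] - pooled_mean:,.0f}")
print(f"pooled mean: {pooled_mean:,.0f}, matches direct average: {matches_direct}")
```

With two thirds of the rows, the largest state sits closest to the pooled mean, while the smallest states end up farthest from it.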

!pip install geopandas
!pip install pyshp
!pip install shapely
!pip install plotly-geo
##Must "conda install plotly" on terminal before proceeding
Now that we know the true average housing price for each state, we want to see how housing prices compare across counties within each state. Our results above showed that bootstrapping estimated the average housing cost for Washington well and for California less accurately. Since we already suggested that size may have contributed to this discrepancy, we build on this idea by considering how the counties within each state contribute to our results. We believe that counties with higher population density may also have higher housing costs. In the bootstrap section we averaged over cities; when we look at the average housing price of a county, we are still averaging over the cities that lie within that county. Since Washington was our best estimate, we hypothesize that the average housing costs of Washington's counties are not very different from one another and that population density is also fairly consistent across them. In contrast, we expect California's counties to have a very diverse range of population densities, and we expect those with larger populations to have higher housing costs, since supply may be limited in these areas while demand is high, pushing prices up.
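Averaging the city-level values within each county and attaching the result to the census table is a group-then-join operation. A minimal sketch with invented rows follows; the column names CountyName, CTYNAME, and TOT_POP are assumptions about the two files' headers (our own code indexes these columns by position).

```python
import pandas as pd

# Hypothetical city-level averages (one row per city) with the county each city belongs to.
cities = pd.DataFrame({
    "RegionName": ["City A", "City B", "City C", "City D"],
    "CountyName": ["Alpha County", "Alpha County", "Beta County", "Gamma County"],
    "average": [300_000, 500_000, 250_000, 420_000],
})
# Hypothetical census table with one row per county.
census = pd.DataFrame({
    "CTYNAME": ["Alpha County", "Beta County", "Delta County"],
    "TOT_POP": [1_000_000, 200_000, 50_000],
})

# Mean of all cities within each county.
county_avg = cities.groupby("CountyName")["average"].mean()
# Look up each census county in that table; counties with no Zillow cities get NaN.
census["Average_County_Sale_Price"] = census["CTYNAME"].map(county_avg)
print(census)
```

Alpha County receives the mean of its two cities rather than the value of whichever city happened to be processed last.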
df_sample = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/minoritymajority.csv')
df_sample_ca = df_sample[df_sample['STNAME'] == 'California']
df_sample_ca = df_sample_ca.reset_index()
CA = CA.reset_index()
df_sample_or = df_sample[df_sample['STNAME'] == 'Oregon']
df_sample_or = df_sample_or.reset_index()
OR = OR.reset_index()
df_sample_wa = df_sample[df_sample['STNAME'] == 'Washington']
df_sample_wa = df_sample_wa.reset_index()
WA = WA.reset_index()
## state_df: Zillow rows for one state (after reset_index, column 5 is the county name)
## plot_state_df: census rows for the same state (after reset_index, column 3 is CTYNAME)
def state_setup(state_df, plot_state_df):
    ## Average 2010-2018 price for each city (row), matching the map legend
    state_df["average"] = state_df.loc[:, "2010-01":"2018-12"].mean(axis=1).round(2)
    ## Average the cities within each county
    county_avg = state_df.groupby(state_df.iloc[:, 5])["average"].mean()
    ## Attach each county's average to the census table; counties with no Zillow data get 0
    plot_state_df["Average_County_Sale_Price"] = plot_state_df.iloc[:, 3].map(county_avg).fillna(0).round()
    return plot_state_df
plot_CA = state_setup(CA, df_sample_ca)
plot_OR = state_setup(OR, df_sample_or)
plot_WA = state_setup(WA, df_sample_wa)
To compare average house prices across counties, we derive a population measure from the census data in df_sample to see whether the housing market differs with the population of each county. The function below adds this measure for all three states. Note that what we call population density here is each county's share of its state's total population, multiplied by 1000; it does not account for land area.
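Concretely, the Density column built by add_density is a population share, so within a state the values always sum to 1000. A true per-area density would also need each county's land area, which the code here does not use; the land_sq_mi figures below are hypothetical and are included only to show how the two measures differ.

```python
import pandas as pd

# Hypothetical counties: population and (made-up) land area in square miles.
counties = pd.DataFrame({
    "CTYNAME": ["Alpha County", "Beta County", "Gamma County"],
    "TOT_POP": [800_000, 150_000, 50_000],
    "land_sq_mi": [450.0, 1_500.0, 3_000.0],  # hypothetical, for illustration only
})

# What add_density computes: share of the state population, rescaled by 1000.
counties["Density"] = counties["TOT_POP"] / counties["TOT_POP"].sum() * 1000
# A per-area density for comparison (people per square mile).
counties["people_per_sq_mi"] = counties["TOT_POP"] / counties["land_sq_mi"]
print(counties)
```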
def add_density(state_data):
    ## Total state population (column 4 is TOT_POP after reset_index)
    total = state_data.iloc[:, 4].sum()
    ## Each county's share of the state population, rescaled by 1000 for plotting
    state_data["Density"] = state_data.iloc[:, 4] / total * 1000
add_density(plot_CA)
add_density(plot_OR)
add_density(plot_WA)
import plotly.figure_factory as ff
import pandas as pd
#Oregon Block
values = plot_OR["Average_County_Sale_Price"].tolist()
fips = plot_OR['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Oregon'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Average House Price from 2010-2018',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
values = plot_OR["Density"].tolist()
fips = plot_OR['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Oregon'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Population by County in Density rescaled to times 1000',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
Using the average housing prices and state census data, we can visualize how housing costs vary across the counties of each state. We believe the best place to buy a house is a highly populated county (by our density measure) whose average housing price is at or below the state average. For Oregon, Multnomah, Clackamas, and Washington Counties have the largest populations, and their average housing prices fall in the $202,475–$303,712 range, which roughly meets our standard. That these counties are highly populated makes sense, since Portland, the state's largest city, lies mostly within Multnomah County.
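The selection rule described above (a highly populated county whose average price is at or below the state average) can be written as a simple filter. This is a sketch on made-up numbers; treating the top quartile of population share as "highly populated" is our own illustrative threshold, not one used in the analysis, and state_avg is a hypothetical value.

```python
import pandas as pd

# Hypothetical county table: population share (x1000) and average sale price.
counties = pd.DataFrame({
    "CTYNAME": ["A County", "B County", "C County", "D County",
                "E County", "F County", "G County", "H County"],
    "Density": [300, 220, 180, 100, 80, 60, 40, 20],
    "Average_County_Sale_Price": [420_000, 260_000, 240_000, 300_000,
                                  180_000, 200_000, 150_000, 170_000],
})
state_avg = 280_000  # hypothetical state-wide average price

# "Highly populated": population share in the top quartile (illustrative threshold).
high_pop = counties["Density"] >= counties["Density"].quantile(0.75)
# "Affordable": average price at or below the state average.
affordable = counties["Average_County_Sale_Price"] <= state_avg

candidates = counties[high_pop & affordable].sort_values("Density", ascending=False)
print(candidates)
```

A County is the most populated but fails the price test, which mirrors how the most populated county is not necessarily the best place to buy.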
#California Block
values = plot_CA["Average_County_Sale_Price"].tolist()
fips = plot_CA['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['California'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Average House Price from 2010-2018',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
values = plot_CA["Density"].tolist()
fips = plot_CA['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['California'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Population by County in Density rescaled to times 1000',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
For California, the most densely populated counties are in Southern California: Los Angeles, San Bernardino, Riverside, Orange, and San Diego. Despite these highly populated areas, the most expensive place to live is Marin County in the Bay Area. This may be explained by its proximity to San Francisco: people who live in Marin can commute to SF for work or leisure while staying outside the hectic city life. Among highly populated counties with costs at or below the California housing average, Los Angeles County would be the best choice, while the Bay Area is quite expensive relative to its comparatively small population.
#Washington Block
values = plot_WA["Average_County_Sale_Price"].tolist()
fips = plot_WA['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Washington'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Average House Price from 2010-2018',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
values = plot_WA["Density"].tolist()
fips = plot_WA['FIPS'].tolist()
endpts = list(np.mgrid[min(values):max(values):9j])
colorscale = ["#030512","#1d1d3b","#323268","#3d4b94","#3e6ab0",
"#4989bc","#60a7c7","#85c5d3","#b7e0e4","#eafcfd"]
fig = ff.create_choropleth(
fips=fips, values=values, scope=['Washington'], show_state_data=True,
colorscale=colorscale, binning_endpoints=endpts, round_legend_values=True,
plot_bgcolor='rgb(229,229,229)',
paper_bgcolor='rgb(229,229,229)',
legend_title='Population by County in Density rescaled to times 1000',
county_outline={'color': 'rgb(255,255,255)', 'width': 0.5},
exponent_format=False,
)
fig.layout.template = None
fig.show()
Lastly, comparing Washington's county population density with its average housing prices, the most populated counties are King, followed by Pierce and Snohomish, and the most expensive housing is also in King County. Again, this makes sense because Seattle is a bustling major city, so we would expect housing there to be more expensive. As for the best counties to buy a house in, we conclude Snohomish, followed by Pierce County: Snohomish County is highly populated, and its housing prices fall in the $154,903–$309,807 range.
As we expected, the most highly populated counties within each state also had the highest housing prices (with the exception of California), but we were still able to find highly populated counties whose prices were in the ballpark of the state average.
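One way to put a number on the population-versus-price pattern would be a rank correlation between each county's population share and its average price, computed per state. We did not compute this statistic in the analysis above, so the sketch is offered only as a possible extension; it uses invented data and computes Spearman's correlation as the Pearson correlation of the ranks, which avoids an extra SciPy dependency.

```python
import pandas as pd

# Hypothetical county data for one state.
counties = pd.DataFrame({
    "Density": [500, 200, 120, 80, 50, 30, 20],
    "Average_County_Sale_Price": [450_000, 380_000, 300_000, 310_000,
                                  220_000, 200_000, 210_000],
})

# Spearman rank correlation = Pearson correlation of the ranks (ties get average ranks).
rho = counties["Density"].rank().corr(counties["Average_County_Sale_Price"].rank())
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near +1 would indicate that more populated counties are consistently pricier; a value near 0, as one might find for a state like California, would indicate no consistent ordering.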

So far, we have determined the average housing price for California, Washington, and Oregon and visualized differences in housing prices and population density across counties. We now have a clearer idea of where to look for an affordable house, but when would be a good time to buy one? Basing the seasons on the solstices and equinoxes, we define Winter as December 21st – March 20th, Spring as March 21st – June 20th, Summer as June 21st – September 20th, and Autumn as September 21st – December 20th. Because the Zillow data are monthly, our code approximates these seasons with calendar quarters: January–March for Winter, April–June for Spring, July–September for Summer, and October–December for Autumn (Fall). We hypothesize that Autumn or Winter would be the best time to buy a house, primarily because we think December would be the best month: as it gets colder and rainier in all three states, we would expect the number of competing home buyers to drop, lowering house prices to compensate for the lower demand. Summer could be the worst time to buy, because better weather may bring more people to open houses, increasing demand for the houses we are most interested in. Using a box-and-whisker plot and a violin plot for each state, we can see which season has the lowest and highest prices, as well as how the values are spread within each season, based on the points overlaid on our violin plots.
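Because Zillow reports one value per month, the solstice-based seasons can only be approximated. The season_split function groups calendar quarters, and the sketch below makes that month-to-season mapping explicit on made-up column labels and values, so the approximation is easy to see; the helper name season_of is hypothetical.

```python
import pandas as pd

# Calendar-quarter approximation of the seasons (month number -> season),
# matching the offsets 0/3/6/9 used in season_split.
SEASON_OF_MONTH = {1: "winter", 2: "winter", 3: "winter",
                   4: "spring", 5: "spring", 6: "spring",
                   7: "summer", 8: "summer", 9: "summer",
                   10: "fall", 11: "fall", 12: "fall"}

def season_of(column_label):
    """Map a 'YYYY-MM' column label to its approximate season."""
    return SEASON_OF_MONTH[int(column_label[5:7])]

# Made-up monthly averages for one state, two years of data.
labels = [f"{year}-{month:02d}" for year in (2010, 2011) for month in range(1, 13)]
toy_means = pd.Series(range(100, 124), index=labels, dtype=float)

# Group the 24 monthly values into four seasons of 6 values each.
by_season = toy_means.groupby(toy_means.index.map(season_of))
print(by_season.mean())
```

Note that December falls in the Fall group under this approximation, even though most of it belongs to Autumn by the solstice definition anyway.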
def season_split(state, season):
    ## state: dataframe of the state you want to split {CA, OR, WA}
    ## season: string, one of {"winter", "spring", "summer", "fall"}
    ## Monthly data force a calendar-quarter approximation of the seasons:
    ## winter = Jan-Mar, spring = Apr-Jun, summer = Jul-Sep, fall = Oct-Dec
    state_data = state.loc[:, "2010-01":"2018-12"]
    ## Offset of the season's first month within each year
    start = {"winter": 0, "spring": 3, "summer": 6, "fall": 9}[season]
    ## Integer column positions of the season's three months in every year
    ## (9 years x 3 months = 27 columns); .iloc requires integers, not floats
    indices = [year_start + start + k
               for year_start in range(0, state_data.shape[1], 12)
               for k in range(3)]
    ## Identifying columns (RegionName, State, CountyName after reset_index)
    state_title = state.iloc[:, [2, 3, 5]]
    ## Keep only the season's columns
    state_data = state_data.iloc[:, indices]
    return pd.concat([state_title, state_data], axis=1)
The code below computes the average Pacific Coast home price for each calendar month of each year from 2009 to 2018.
years_09_18 = CA.iloc[:,160:280]
## Average Pacific Coast price for one calendar month in each year
def get_month_data(years, name, month_index):
    ## years: DataFrame of monthly columns starting in January (here 2009-01 to 2018-12)
    ## month_index: 1 = January, ..., 12 = December
    n_years = years.shape[1] // 12
    ## Select the same month from every year (every 12th column)
    month = years.iloc[:, [i * 12 + (month_index - 1) for i in range(n_years)]]
    ## Mean across all cities for each selected column, indexed by the column's year
    return pd.DataFrame({name: month.mean().values},
                        index=[int(col[:4]) for col in month.columns])
january = get_month_data(pacific_part3, "January", 1)
february = get_month_data(pacific_part3, "February", 2)
march = get_month_data(pacific_part3, "March", 3)
april = get_month_data(pacific_part3, "April", 4)
may = get_month_data(pacific_part3, "May", 5)
june = get_month_data(pacific_part3, "June", 6)
july = get_month_data(pacific_part3, "July", 7)
august = get_month_data(pacific_part3, "August", 8)
september = get_month_data(pacific_part3, "September", 9)
october = get_month_data(pacific_part3, "October", 10)
november = get_month_data(pacific_part3, "November", 11)
december = get_month_data(pacific_part3, "December", 12)
A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.
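The kernel density estimate behind each violin can be sketched directly: place a Gaussian bump on every data point and average them. This minimal version uses NumPy only and picks the bandwidth with Scott's rule (sample standard deviation times n^(-1/5)), which we assume for illustration; seaborn uses its own KDE implementation, and its defaults may differ. The helper gaussian_kde_1d is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `data` at the points in `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension.
        bandwidth = data.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every data point.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the estimate integrates to 1.
    return kernels.mean(axis=1) / bandwidth

# Made-up seasonal averages (27 values, like one season from season_split).
rng = np.random.default_rng(1)
values = rng.normal(400_000, 30_000, size=27)
grid = np.linspace(values.min() - 150_000, values.max() + 150_000, 2001)
density = gaussian_kde_1d(values, grid)

# The area under the curve should be very close to 1.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

The width of a violin at a given price is proportional to this density, so a wider section means more values near that price, not more houses sold.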
import seaborn as sns
%matplotlib inline
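## Average across all cities for each month in the season (numeric_only skips the name columns)
CA_wint = season_split(CA, "winter").mean(numeric_only=True).values
CA_spring = season_split(CA, "spring").mean(numeric_only=True).values
CA_summer = season_split(CA, "summer").mean(numeric_only=True).values
CA_fall = season_split(CA, "fall").mean(numeric_only=True).values
Seasons_CA = {"Winter":CA_wint,
"Spring":CA_spring,
"Summer":CA_summer,
"Fall":CA_fall}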
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
test_CA = pd.DataFrame((Seasons_CA))
sns.boxplot(data=test_CA,palette='rainbow',orient='h')
plt.subplot(1,2,2)
sns.violinplot(data=test_CA,palette='rainbow')
sns.swarmplot(data=test_CA)
Looking at the seasonal analysis for California, Winter seems to be the best time to buy a house, since the Winter box is comparatively smaller than those of the other seasons. Fall appears to be the worst time to buy, with the average cost of a house in Fall much higher than in any other season. In the violin plot for California, Winter has a wider shape, especially around $400,000, while the shapes for Summer and Fall stay fairly constant in width. This suggests a higher probability of buying a home for around $400,000 in Winter than in the other seasons, while in Autumn the probability is fairly constant across prices.
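The box-plot reading can be made concrete by tabulating the statistics each box draws: the median (center line) and the interquartile range (box width). A lower median indicates a cheaper season, while a narrower box only indicates less spread, so the two are worth checking separately. The sketch uses invented values laid out like test_CA (one column per season).

```python
import pandas as pd

# Hypothetical seasonal averages, laid out like test_CA: one column per season.
toy_seasons = pd.DataFrame({
    "Winter": [380_000, 390_000, 400_000, 410_000, 420_000],
    "Spring": [385_000, 395_000, 405_000, 430_000, 450_000],
    "Summer": [390_000, 400_000, 410_000, 420_000, 440_000],
    "Fall":   [395_000, 405_000, 415_000, 425_000, 445_000],
})

# Median = line inside each box; IQR = Q3 - Q1 = width of each box.
summary = pd.DataFrame({
    "median": toy_seasons.median(),
    "IQR": toy_seasons.quantile(0.75) - toy_seasons.quantile(0.25),
})
print(summary.sort_values("median"))
```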
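## Average across all cities for each month in the season (numeric_only skips the name columns)
OR_wint = season_split(OR, "winter").mean(numeric_only=True).values
OR_spring = season_split(OR, "spring").mean(numeric_only=True).values
OR_summer = season_split(OR, "summer").mean(numeric_only=True).values
OR_fall = season_split(OR, "fall").mean(numeric_only=True).values
Seasons = {"Winter":OR_wint,
"Spring":OR_spring,
"Summer":OR_summer,
"Fall":OR_fall}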
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
test_OR = pd.DataFrame((Seasons))
sns.boxplot(data=test_OR,palette='rainbow',orient='h')
plt.subplot(1,2,2)
sns.violinplot(data=test_OR,palette='rainbow')
sns.swarmplot(data=test_OR)
The results for Oregon are fairly consistent with those for California. Again, Winter is a better time for us to buy an affordable house and Fall is the most expensive time to buy, but Spring also has a fairly high probability of buying a home for around $200,000 and, as the plotted points indicate, may even offer more variety when choosing a house.
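## Average across all cities for each month in the season (numeric_only skips the name columns)
WA_wint = season_split(WA, "winter").mean(numeric_only=True).values
WA_spring = season_split(WA, "spring").mean(numeric_only=True).values
WA_summer = season_split(WA, "summer").mean(numeric_only=True).values
WA_fall = season_split(WA, "fall").mean(numeric_only=True).values
Seasons_WA = {"Winter":WA_wint,
"Spring":WA_spring,
"Summer":WA_summer,
"Fall":WA_fall}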
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
test_WA = pd.DataFrame((Seasons_WA))
sns.boxplot(data=test_WA,palette='rainbow',orient='h')
plt.subplot(1,2,2)
sns.violinplot(data=test_WA,palette='rainbow')
sns.swarmplot(data=test_WA)
Lastly, the Washington data complement the results of the other two states. The violin plots show that Winter is the best time, giving us a higher probability of finding a house priced between $250,000 and $300,000. Spring again offers more variety in the number of houses available at different prices.
Overall, Winter (December 21st – March 20th) appears to be a great time to buy a house, but Spring is also favorable, especially if we would like more options to choose from. While Spring shows a slight increase in housing prices, it also appears to have more houses on the market spread across the range of prices.
pacific_month_avg = pd.concat([january,february,march,april,may,june,july,august,september,october,november,december],axis = 1)
pacific_month_avg.index.name='year'
pacific_month_avg
With these results in mind, we wanted to compare the individual months. Since we saw roughly the same results for all three states, we did this analysis on the Pacific Coast dataset. May, April, and March appear to be the best months to buy, with May having the lowest average cost of all the months. This supports the conclusion that Spring is overall the best season to buy a house.
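Ranking the months by their average across years turns this comparison into a single sorted list. The sketch builds a small table shaped like pacific_month_avg (one row per year, one column per month) from invented numbers, with a made-up spring dip, and sorts the column means.

```python
import numpy as np
import pandas as pd

month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
# Invented table shaped like pacific_month_avg: rows are years, columns are months.
# Prices rise steadily month over month, plus a small made-up dip in spring.
base = 300_000 + 1_000 * np.arange(36).reshape(3, 12)
dip = np.array([0, 0, -3_000, -4_000, -6_000, 0, 0, 0, 0, 0, 0, 0])
toy_month_avg = pd.DataFrame(base + dip, index=[2016, 2017, 2018], columns=month_names)

# Average each month over the years, then sort from cheapest to most expensive.
month_ranking = toy_month_avg.mean().sort_values()
print(month_ranking.head(3))
```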
plt.figure(figsize=(15,10))
sns.boxplot(data=pacific_month_avg,palette='rainbow',orient='h')

Despite figuring out the most cost-effective places to buy a house within California, Washington, and Oregon, as well as the best season, we are still uncertain whether buying now or in the next few years would be a sound investment. We remember the housing bubble from early 2006 to 2012: housing prices peaked in early 2006 and then declined until reaching their lowest point in 2012, and in 2008 prices saw the largest drop in history, which contributed to the 2007–2009 recession (Housing Bubble). We want to see whether the average housing price from 2000 to 2019 gives some indication of whether the housing market is currently at a peak or in a decline, so as to better understand whether now or the immediate future is the best time to buy.
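import seaborn as sns
%matplotlib inline
## Monthly columns from January 2000 onward (labels look like "YYYY-MM");
## this skips the identifying columns and the "average" column added by state_setup
date_cols = [c for c in CA.columns
             if isinstance(c, str) and c[:4].isdigit() and c >= "2000-01"]
## Mean over all California cities for each month
mean_CA = CA[date_cols].mean()
plt.figure(figsize=(25, 6), dpi=80, facecolor='w', edgecolor='k')
plt.plot(range(len(date_cols)), mean_CA.values, color='mediumvioletred', linewidth=3)
## Label one tick per year so the axis stays readable
plt.xticks(range(0, len(date_cols), 12), date_cols[::12], rotation=90)
plt.title("Average California home prices (2000-2019)", fontsize=16)
plt.ylabel("Average house price")
plt.show()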
In our Zillow data, prices increased from 2000 to early 2006, followed by a sharp decline from 2006 to 2012. After the housing bubble, the market has been climbing steadily, and the recent peak has even surpassed the 2006 peak. We also see a faint decrease at the tail, leading us to believe that the housing market is experiencing, or will soon experience, a decline again. Because many factors affect the housing market, it would be difficult to forecast when and for how long this drop may occur, so we leave it at saying that the market may continue to decrease; when more data become available, we may be able to give a more definitive answer.
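The peak-and-decline reading of the chart can be quantified with a running maximum: the drawdown at each month is how far the price sits below the highest value seen so far, and a 12-month change shows whether prices are still rising year over year. A sketch on a made-up monthly series (not our California averages):

```python
import numpy as np
import pandas as pd

# Made-up monthly average prices: a boom, a bust, a recovery to a new peak, then a small dip.
values = np.concatenate([
    np.linspace(250_000, 500_000, 75),   # boom
    np.linspace(500_000, 330_000, 72),   # bust
    np.linspace(330_000, 560_000, 80),   # recovery past the old peak
    np.linspace(560_000, 550_000, 6),    # slight decline at the end
])
prices = pd.Series(values, index=pd.date_range("2000-01-01", periods=len(values), freq="MS"))

running_peak = prices.cummax()
drawdown = prices / running_peak - 1        # 0 at a new high, negative below it
yoy_change = prices / prices.shift(12) - 1  # change versus the same month a year earlier

print("largest drawdown:", f"{drawdown.min():.1%}", "in", drawdown.idxmin().strftime("%Y-%m"))
print("latest drawdown:", f"{drawdown.iloc[-1]:.1%}")
print("latest 12-month change:", f"{yoy_change.iloc[-1]:.1%}")
```

A small negative drawdown with a still-positive 12-month change, as in this toy series, is the kind of "faint decrease" that is hard to distinguish from noise without more data.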
When analyzing the housing market, it was challenging to find data, since most websites release only a limited amount. Our Zillow data were organized as time series, so getting county-level information out of them was a challenge. Nevertheless, we were able to obtain county populations from the census data and use them to approximate population density, supplementing our limited data.
When it comes to buying a home, many factors go into the decision-making process. We focused primarily on cost: how the cost of buying a house varies not only between three different states but within each state, and when is the best time to buy. In doing so, we have gained a better understanding of the housing market and more insight into its cost dynamics, but we have also set aside other key aspects of the decision to buy a house. We performed these tests assuming we already knew we wanted a 3-bedroom home, so we are not referring to such direct factors; rather, we understand that the geography of where we choose to buy often affects whether we buy and sometimes even the price of the house. In the future, we may look to pinpoint which houses are within a certain radius of a specific branch or hospital, which houses are near a downtown area with a fun, lively nightlife, or even create a color map of crime rates over the last 5 years to be sure we choose a safe neighborhood.
[SOURCES]
“Housing Data.” Zillow Research, https://www.zillow.com/research/data/.
“United States Housing Bubble.” Investopedia, 19 Nov. 2019, https://www.investopedia.com/terms/h/housing_bubble.asp.
“USA County Choropleth Maps.” Plotly, https://plot.ly/python/county-choropleth/.